Create a Quarto file for ALL Lab 2 (no separate files for Parts 1 and 2).
Make sure your final file is carefully formatted, so that each analysis is clear and concise.
Be sure your knitted .html file shows all your source code, including any function definitions.
Part One: Identifying Bad Visualizations
If you happen to be bored and looking for a sensible chuckle, you should check out these Bad Visualisations. Looking through these is also a good exercise in cataloging what makes a visualization good or bad.
Dissecting a Bad Visualization
Below is an example of a less-than-ideal visualization from the collection linked above. It comes to us from data provided for the Wellcome Global Monitor 2018 report by the Gallup World Poll:
While there are certainly issues with this image, do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
The graph shows, for each country, the percentage of people who believe that vaccines are safe. Countries are grouped by region, and the regions differ in their average percentage. In short, belief in vaccine safety varies by both country and region.
List the variables that appear to be displayed in this visualization. Hint: Variables refer to columns in the data.
The variables displayed appear to be the percentage of people who believe vaccines are safe, country, and region.
Now that you’re versed in the grammar of graphics (e.g., ggplot), list the aesthetics used and which variables are mapped to each.
x: percentage of people who believe that vaccines are safe
y: country
color: region
What type of graph would you call this? Meaning, what geom would you use to produce this plot?
I would call it a scatterplot, but with jittered points for each country, so probably geom_jitter().
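To make the mapping above concrete, here is a minimal sketch using a small made-up data frame. The object `toy_safety` and all of its values are placeholders chosen for illustration, not the Wellcome data. It maps the same three variables to x, y, and color, and shows that switching between geom_point() and geom_jitter() only changes the layer, not the aesthetics.

```r
library(ggplot2)

# Placeholder values for illustration only; NOT taken from the Wellcome data.
toy_safety <- data.frame(
  country    = c("A", "B", "C", "D"),
  region     = c("Region 1", "Region 1", "Region 2", "Region 2"),
  percentage = c(60, 75, 85, 90)
)

# Same aesthetic mapping as listed above:
# x = percentage, y = country, color = region.
base <- ggplot(toy_safety, aes(x = percentage, y = country, color = region))

# Points drawn at their exact positions.
p_point <- base + geom_point()

# Same points with a small vertical jitter and no horizontal jitter,
# so the percentages stay readable on the x-axis.
p_jitter <- base + geom_jitter(height = 0.2, width = 0)
```

Printing `p_point` or `p_jitter` draws the plot. Keeping `width = 0` matters here because horizontal jitter would distort the percentages being compared.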
Provide at least four problems or changes that would improve this graph. Please format your changes as bullet points!
- It is hard to compare countries in different regions with each other
- The y-axis has no meaning
- The legend is unnecessary since the labels are already there
- There is too much going on and it takes too much time to understand the message
There are two worksheets in the downloaded dataset file. You may need to read them in separately, but you may also just use one if it suffices.
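One possible way to inspect and read both worksheets is sketched below with the readxl package. The helper `read_all_sheets()` and the file name in `data_path` are hypothetical and should be replaced with whatever matches the downloaded file. The sketch lists the worksheet names first, so nothing about the sheet names is assumed.

```r
library(readxl)

# Hypothetical helper: read every worksheet of an Excel file into a named list.
read_all_sheets <- function(path) {
  sheets <- excel_sheets(path)  # names of all worksheets in the file
  out <- lapply(sheets, function(s) read_excel(path, sheet = s))
  names(out) <- sheets
  out
}

# Hypothetical file name; replace with the path to the downloaded dataset.
data_path <- "wgm2018-dataset.xlsx"

# Only read the file if it is actually present at that path.
if (file.exists(data_path)) {
  wgm <- read_all_sheets(data_path)
  print(names(wgm))  # check which worksheets are available before choosing
}
```

If one worksheet suffices, a single `read_excel(data_path, sheet = ...)` call on that sheet is enough.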
library(tidyverse)
library(here)
library(plotly)
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
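# Attach regions to the country-level data and drop the "Other" category.
plot <- final |>
  left_join(regions, by = join_by(country == country)) |>
  filter(region != "Other") |>
  # Fix the order in which regions appear on the y-axis.
  mutate(
    region = factor(
      region,
      levels = c(
        "Asia", "America", "Sub-Saharan Africa",
        "Middle East / North Africa", "Europe", "Former Soviet Union"
      )
    )
  ) |>
  ggplot(aes(y = region, x = percentage, fill = region, group = region)) +
  geom_boxplot() +
  theme_minimal() +
  # The region labels are already on the y-axis, so the legend is redundant.
  theme(legend.position = "none")

# Convert the static ggplot into an interactive plotly widget.
ggplotly(plot)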
Second Data Visualization Improvement
For this second plot, you must select a plot that uses maps so you can demonstrate your proficiency with the leaflet package!
Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
List the variables that appear to be displayed in this visualization.
Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.
What type of graph would you call this?
List all of the problems or things you would improve about this graph.
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
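As a starting point for the map requirement, here is a minimal leaflet sketch. Everything in `toy_map` is a placeholder: the country labels, the coordinates, and the percentages are made up and are not values from the report. It shows one way to color markers by a numeric variable and add a matching legend.

```r
library(leaflet)

# Placeholder data for illustration only; labels, coordinates, and
# percentages are invented and are NOT values from the report.
toy_map <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  lat     = c(46.2, 36.2, -14.2),
  lng     = c(2.2, 138.3, -51.9),
  pct     = c(50, 60, 70)
)

# Map the numeric percentage to a continuous color scale.
pal <- colorNumeric(palette = "Blues", domain = toy_map$pct)

m <- leaflet(toy_map) |>
  addTiles() |>  # default OpenStreetMap base tiles
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    color = ~pal(pct),
    radius = 8, stroke = FALSE, fillOpacity = 0.8,
    label = ~paste0(country, ": ", pct, "%")  # text shown on hover
  ) |>
  addLegend(pal = pal, values = ~pct, title = "Percent (placeholder)")

m
```

With real data, a choropleth built with addPolygons() on country shapes may tell the story better than point markers, but that requires a spatial data source not assumed here.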
Third Data Visualization Improvement
For this third plot, you must use one of the other ggplot2 extension packages mentioned this week (e.g., gganimate, plotly, patchwork, cowplot).
Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
List the variables that appear to be displayed in this visualization.
Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.
What type of graph would you call this?
List all of the problems or things you would improve about this graph.
Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
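For the extension-package requirement, the sketch below uses patchwork to place two related plots side by side under a shared title. The data frame `toy_groups` and its values are placeholders for illustration and do not come from the report.

```r
library(ggplot2)
library(patchwork)

# Placeholder values for illustration only; NOT taken from the report.
toy_groups <- data.frame(
  group = c("G1", "G2", "G3"),
  pct_a = c(40, 55, 70),
  pct_b = c(35, 60, 65)
)

# Two simple bar charts that share the same y-axis categories.
p_a <- ggplot(toy_groups, aes(x = pct_a, y = group)) +
  geom_col() +
  labs(title = "Measure A (placeholder)", x = "Percent", y = NULL)

p_b <- ggplot(toy_groups, aes(x = pct_b, y = group)) +
  geom_col() +
  labs(title = "Measure B (placeholder)", x = "Percent", y = NULL)

# `|` places the plots side by side; plot_annotation() adds an overall title.
combined <- (p_a | p_b) +
  plot_annotation(title = "Two measures compared (placeholder data)")

combined
```

Side-by-side panels with a common y-axis make it easier to compare the same groups across two measures than overlaying them in one crowded plot.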